Skip to content

feat: Feather (Arrow IPC) positional point-lookup sidecar#28

Merged
anoop-narang merged 5 commits into
mainfrom
poc/702-feather-sidecar
Jun 17, 2026
Merged

feat: Feather (Arrow IPC) positional point-lookup sidecar#28
anoop-narang merged 5 commits into
mainfrom
poc/702-feather-sidecar

Conversation

@anoop-narang

@anoop-narang anoop-narang commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Adds FeatherLookupProvider + FeatherSidecarBuilder behind a new feather-provider feature — an Arrow/Feather (Arrow IPC) positional payload sidecar served through the existing PointLookupProvider trait, as a sibling to the SQLite and Parquet providers. Targets the DuckLake vector-search path (sparse, holey rowids).

What it does

  • Stores the payload as an Arrow IPC file sorted ascending by rowid.
  • Read (mmap): the file is memory-mapped and decoded zero-copy (FileDecoder over a Buffer backed by the mapping). A first-key-per-batch coarse index narrows a lookup to one batch; partition_point binary-searches that batch's rowid column; an exact-match guard rejects a missing rowid; take pulls the matched rows. A scattered take faults in only the touched pages, so resident heap is bounded by the working set — memory-comparable to SQLite paging its B-tree (no full-file load).
  • Build (bounded): an external spill-sort. push_batch streams rows UNSORTED to a temp IPC file (O(one batch) payload resident), keeping only an O(N)-key index in memory; finish argsorts the keys and gathers rows in sorted order from the mmap'd temp via interleave, in bounded chunks, into a temp file that is atomically renamed into place. No full-payload buffering and no dependency on the input scan's row order (DuckLake rowid scan order is not reliable).

Why

  • Drops the rusqlite/bundled-libsqlite3 + serde_json tax for this path.
  • Type fidelity: stores Arrow verbatim — round-trips Decimal / Struct / Map / FixedSizeList / Dictionary / Timestamp(tz) / nested List, which the SQLite provider rejects or coerces.
  • Speed: Feather + sorted-key binary search beats SQLite 3–6× at K=1000 for scattered sparse keys (microbench); build is bulk-columnar.

Tests

SQLite-parity on supported types (sparse-rowid key sets incl. edges, duplicates, absent keys, empty; single- and multi-batch with cross-boundary keys), lossless round-trip of the SQLite-rejected types, projection (graceful out-of-range error), sort-agnostic/shuffled build, open() rejection paths (unsorted/duplicate/malformed/missing/corrupt-footer), TableProvider::scan, multiple key types, and empty/single-row/exact-chunk-boundary builds. Also verified end-to-end on a live server against a real DuckLake snapshot.

Known limitation (for review)

Dictionary payload columns whose dictionary differs across input batches are rejected at build time (the IPC writer requires a consistent dictionary across the spilled batches) rather than silently corrupted — documented in the module header. Single-dictionary (the common case) round-trips fine.

Scope / follow-ups

Compress-at-rest + expand-on-NVMe for storage/egress; mount demand-paging / partial S3-mount fetch; removing the sqlite-provider feature/rusqlite dep once the parquet backend (its only remaining user) is removed.

FeatherLookupProvider + FeatherSidecarBuilder behind a feather-provider
feature: a sparse-rowid positional sidecar for DuckLake vector search,
served through the existing PointLookupProvider trait.

Hydration is binary-search (partition_point) over the sorted rowid column
plus an exact-match guard, then take(positions) on verbatim Arrow columns
- no SQLite/serde_json dep and no Arrow<->SQL type mapping. The builder
sorts at finish, so correctness does not depend on the DuckLake scan order.

Tests cross-check row/key/order/projection parity against the SQLite
provider on the supported types, and verify lossless round-trip of types
SQLite rejects (Decimal/FixedSizeList/Struct/Dictionary/nested List).
finish() sorted rows via sort_to_indices on the raw key column (native,
i.e. signed, order for Int64/Int32), while the open-time verification and
fetch_by_keys binary-search compare keys cast as u64. The two orderings
diverge for keys whose i64 and u64 order differ (e.g. negative Int64), so
a genuinely-ascending build was rejected by from_batches as 'corrupt'.

Argsort the u64-cast keys instead, so the on-disk order, the open-time
sortedness check, and the search all use one domain for every supported
key type. Tighten the sortedness guard to strictly-ascending (reject
duplicate keys, matching SQLite's INTEGER PRIMARY KEY) and factor the
null-checked u64 key extraction into a shared helper. Drop stray PoC
framing from the module docs.
Add tests for the open() rejection paths (unsorted / duplicate keys,
malformed and missing files), the TableProvider::scan round-trip, UInt64
key columns, and a regression test that negative-but-ascending Int64 keys
build and round-trip through the u64 lookup domain.
Comment thread src/feather_provider.rs Outdated
Comment on lines +187 to +207
let col_indices: Vec<usize> = match projection {
None => (0..self.schema.fields().len()).collect(),
Some(idxs) => idxs.to_vec(),
};
let out_schema: SchemaRef = match projection {
None => self.schema.clone(),
Some(idxs) => Arc::new(arrow_schema::Schema::new(
idxs.iter()
.map(|&i| self.schema.field(i).clone())
.collect::<Vec<_>>(),
)),
};

let pos_arr = UInt64Array::from(positions);
let cols: Vec<ArrayRef> = col_indices
.iter()
.map(|&i| {
compute::take(self.batch.column(i).as_ref(), &pos_arr, None)
.map_err(|e| DataFusionError::ArrowError(Box::new(e), None))
})
.collect::<DFResult<_>>()?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: an out-of-range projection index panics here (self.schema.field(i) and self.batch.column(i) index-out-of-bounds), whereas the sibling SqliteLookupProvider/HashKeyProvider use RecordBatch::project, which returns a DataFusionError instead. Projections are planner-generated and always valid in normal operation, so this isn't reachable today — but matching the siblings' graceful-error behavior would be more defensive. (not blocking)

claude[bot]
claude Bot previously approved these changes Jun 17, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the Feather/Arrow IPC point-lookup provider. The u64-domain consistency fix (build sort / open-time check / search all in one domain) is sound and well-justified, and test coverage is strong — SQLite parity, type fidelity, sort-agnostic build, open() rejection paths, and key-type variants. One non-blocking super nit left inline. LGTM.

Note: I was unable to run cargo test/clippy in the review environment (command approval-gated), so this approval is based on code inspection; please ensure the pre-push checklist passes.

Make FeatherLookupProvider memory-comparable to the SQLite sidecar (which
pages its B-tree) instead of fully resident:

- READ: mmap the file and decode Arrow IPC zero-copy (FileDecoder over a
  Buffer backed by the mapping). Column buffers point into the mapped file,
  so a scattered take() faults in only the touched pages; only the K-row
  output is heap-allocated. A first-key-per-batch coarse index narrows each
  lookup to one (now multi-batch) file segment before the in-batch binary
  search.

- BUILD: bounded-memory external sort. push_batch spills rows UNSORTED to a
  temp IPC file (O(one batch) payload resident), keeping only an O(N)*8B key
  index in memory; finish argsorts the keys and gathers rows in sorted order
  from the mmap'd temp via interleave, in bounded chunks, into the final
  sorted file. No full-payload buffering and no dependency on the input
  scan's row order (the DuckLake rowid scan order is not reliable).

Also: reject out-of-range projection indices with a DataFusionError instead
of panicking (matches the sibling providers; addresses the PR #28 nit), and
reject duplicate keys at build/open. Adds a multi-batch shuffled-build
parity test exercising the coarse index + cross-batch gather. New deps:
memmap2, tempfile.
Address the mmap/spill-rewrite review:

- open(): guard the footer length against the file size and range-check every
  IPC block offset/length before slicing, so a corrupt/foreign file returns a
  typed error instead of panicking (no-panic-on-open contract). Must-fix.
- finish(): publish via write-to-sibling-temp + atomic rename instead of
  truncating the final path in place, so a concurrent reader's live mmap never
  observes a truncated inode (SIGBUS/torn reads); also crash-safe.
- finish(): use usize sort indices (drops the >4.29B-row u32 truncation).
- Docs: correct the build-memory wording (u64 keys + usize permutation, not
  payload), the read-path heap-vs-page-in nuance, and note the single-dictionary
  constraint for Dictionary payload columns.
- Tests: corrupt-footer rejection, empty/single-row builds, exact chunk-boundary
  gather, and cross-push duplicate-key rejection.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the mmap/IPC decode path, the sorted-key lookup, the spill-sort build, and the test suite. Length/offset guards, u64 ordering-domain consistency, key/row alignment, and the atomic publish all check out. The prior projection-panic super-nit is addressed. Tests are thorough and meaningful. LGTM.

@anoop-narang anoop-narang merged commit ad53b11 into main Jun 17, 2026
6 checks passed
@anoop-narang anoop-narang deleted the poc/702-feather-sidecar branch June 17, 2026 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant